BUG: Bug in csv parsing when passing dtype and names and the parsed data is a different data type (GH8833) #8834

Merged
merged 1 commit into from
Nov 17, 2014

Contributor

@jreback jreback commented Nov 16, 2014

closes #8833

In [5]: data = """1.0,a
   ...: nan,b
   ...: 3.0,c
   ...: """

In [8]: read_csv(StringIO(data),sep=',',names=['A','B'])
Out[8]: 
    A  B
0   1  a
1 NaN  b
2   3  c

In [9]: read_csv(StringIO(data),sep=',',names=['A','B'],dtype={'A' : int})
ValueError: cannot safely convert passed user dtype of <i8 for float64 dtyped data in column 0
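For readers who want to try this outside an IPython session, here is a minimal standalone sketch of the same scenario. The helper name `read_with_dtype` is made up for illustration, and the exact error text depends on the pandas version installed, so the code only relies on a `ValueError` being raised when an integer dtype is requested for a column that contains NaN.

```python
import io

import pandas as pd

# Same three rows as in the session above; "nan" is parsed as a missing value.
data = "1.0,a\nnan,b\n3.0,c\n"


def read_with_dtype(dtype):
    """Hypothetical helper: parse the sample data with a user-supplied dtype mapping."""
    return pd.read_csv(io.StringIO(data), sep=",", names=["A", "B"], dtype=dtype)


# Requesting float for column A is compatible with the NaN in the second row.
floats = read_with_dtype({"A": float})
print(floats["A"].dtype)

# Requesting int cannot be honoured because NaN has no int64 representation.
error = None
try:
    read_with_dtype({"A": int})
except ValueError as exc:
    error = exc
    print("refused:", exc)
```

Wrapping the call in a small helper keeps the two dtype requests side by side, which makes it easy to compare the accepted and the rejected case.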

@jreback jreback added Bug IO CSV read_csv, to_csv labels Nov 16, 2014
@jreback jreback added this to the 0.15.2 milestone Nov 16, 2014
expected = DataFrame([[1,1],[2,2],[3,3]],columns=['a','b'])
tm.assert_frame_equal(result, expected)

data = """
Contributor Author

cc @mrocklin
@cpcloud

this case actually raises internally (e.g. when coercing a float -> int), but then I ignore the error and don't cast.

So should this raise? (basically the user is requesting an int, but the data cannot be cast to it).

Member

yeah, I think if you request an int when there's clearly a nan in the column then it should raise

Member

the other alternative is to ignore the user, but that doesn't seem very polite

@jreback jreback force-pushed the parser branch 2 times, most recently from 552ed84 to fc54c8a Compare November 17, 2014 02:21
Contributor Author

jreback commented Nov 17, 2014

@cpcloud
ok, I put up a more informative error message (your original example will pass through just fine, converting to int; this one is a non-convertible case)

Contributor Author

jreback commented Nov 17, 2014

Of course numpy is annoying (this is 1.9.1). You would think that casting a float whose value equals an integer to an int would be safe :)

In [3]: np.array([1.0,2.0,3.0]).astype('i8',casting='unsafe')
Out[3]: array([1, 2, 3])

In [4]: np.array([1.0,2.0,3.0]).astype('i8',casting='safe')
TypeError: Cannot cast array from dtype('float64') to dtype('int64') according to the rule 'safe'

In [5]: np.array([1.0,np.nan,3.0]).astype('i8',casting='unsafe')
Out[5]: array([                   1, -9223372036854775808,                    3])

In [6]: np.array([1.0,np.nan,3.0]).astype('i8',casting='safe')
TypeError: Cannot cast array from dtype('float64') to dtype('int64') according to the rule 'safe'

cc @mwiebe

Contributor

mwiebe commented Nov 17, 2014

The tricky thing here is that NumPy only has the types to make this determination, not the values, so from the perspective of only knowing that it is float -> int, it is not a safe cast. The error mode in dynd is an attempt to provide a value-based way to handle this, by inserting code to check the values as float is converted to int.
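The difference between a type-based and a value-based check can be sketched in a few lines of NumPy. This is only an illustration of the idea, not the code from this pull request or from dynd; `checked_float_to_int` is a hypothetical name, and the sketch does not treat magnitudes near the int64 limit carefully.

```python
import numpy as np


def checked_float_to_int(values):
    """Hypothetical helper: cast floats to int64 only if every value survives exactly."""
    arr = np.asarray(values, dtype=np.float64)
    # NaN and infinity have no integer representation, so reject them up front.
    if not np.isfinite(arr).all():
        raise ValueError("cannot convert NaN or infinity to int64")
    # Silence the cast warning some NumPy versions emit for out-of-range values;
    # the round-trip comparison below is what decides whether the cast is accepted.
    with np.errstate(invalid="ignore"):
        as_int = arr.astype(np.int64)
    # Fractional or out-of-range values do not compare equal after the cast.
    if not (as_int == arr).all():
        raise ValueError("float values are not exactly representable as int64")
    return as_int


# Type-level rule: float64 -> int64 is never "safe", whatever the values are.
type_level_ok = np.can_cast(np.float64, np.int64, casting="safe")
print(type_level_ok)

# Value-level rule: whole-number floats convert cleanly.
print(checked_float_to_int([1.0, 2.0, 3.0]))
```

Checking the values costs an extra pass over the data, which is the trade-off against the purely type-based rule that NumPy applies.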

Contributor Author

jreback commented Nov 17, 2014

@mwiebe and that is exactly what I do

will be trying out dynd for real with ?int64 after pydata
when I have some time!

jreback added a commit that referenced this pull request Nov 17, 2014
BUG: Bug in csv parsing when passing dtype and names and the parsed data is a different data type (GH8833)
@jreback jreback merged commit 5e8ba36 into pandas-dev:master Nov 17, 2014
Development

Successfully merging this pull request may close these issues.

BUG: unhelpful parser exception when passing names and dtype
3 participants